Overview

Dataset statistics

Number of variables: 23
Number of observations: 35353
Missing cells: 327813
Missing cells (%): 40.3%
Total size in memory: 6.2 MiB
Average record size in memory: 184.0 B
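
The percentage of missing cells follows from the other figures in this table. The sketch below recomputes it. The 8-bytes-per-cell figure is an assumption, not something the report states; it is used only because 23 × 8 happens to reproduce the reported 184.0 B record size.

```python
# Figures copied from the dataset statistics above.
n_variables = 23
n_observations = 35353
missing_cells = 327813

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption (not stated in the report): each cell occupies 8 bytes,
# e.g. a float64 or an object pointer.
bytes_per_cell = 8
record_bytes = n_variables * bytes_per_cell

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {record_bytes} B")
```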

Variable types

Text: 13
Unsupported: 8
Numeric: 2

Alerts

Binding has constant value "1" [Constant]
TRA_leader has 638 (1.8%) missing values [Missing]
TRB_leader has 1063 (3.0%) missing values [Missing]
Linker has 35353 (100.0%) missing values [Missing]
Link_order has 35353 (100.0%) missing values [Missing]
TRA_5_prime_seq has 35353 (100.0%) missing values [Missing]
TRA_3_prime_seq has 35353 (100.0%) missing values [Missing]
TRB_5_prime_seq has 35353 (100.0%) missing values [Missing]
TRB_3_prime_seq has 35353 (100.0%) missing values [Missing]
Linked_nt has 35353 (100.0%) missing values [Missing]
Linked_aa has 35353 (100.0%) missing values [Missing]
Score has 8583 (24.3%) missing values [Missing]
MHC A has 1984 (5.6%) missing values [Missing]
MHC B has 32713 (92.5%) missing values [Missing]
Linker is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Link_order is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
TRA_5_prime_seq is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
TRA_3_prime_seq is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
TRB_5_prime_seq is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
TRB_3_prime_seq is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Linked_nt is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Linked_aa is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Score has 25266 (71.5%) zeros [Zeros]
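
Most of the missing cells come from the eight columns that are empty in every row. A minimal sketch of that check, using only the counts listed in the alerts and the totals from the dataset statistics (the variable names are illustrative):

```python
n_observations = 35353
total_missing_cells = 327813  # "Missing cells" in the dataset statistics

# Missing-value counts exactly as listed in the alerts.
missing_by_column = {
    "TRA_leader": 638,
    "TRB_leader": 1063,
    "Linker": 35353,
    "Link_order": 35353,
    "TRA_5_prime_seq": 35353,
    "TRA_3_prime_seq": 35353,
    "TRB_5_prime_seq": 35353,
    "TRB_3_prime_seq": 35353,
    "Linked_nt": 35353,
    "Linked_aa": 35353,
    "Score": 8583,
    "MHC A": 1984,
    "MHC B": 32713,
}

# A column is entirely empty when its missing count equals the row count.
empty_columns = [
    name for name, count in missing_by_column.items()
    if count == n_observations
]
empty_cells = len(empty_columns) * n_observations
share_of_missing = 100 * empty_cells / total_missing_cells

print(empty_columns)
print(f"{empty_cells} of {total_missing_cells} missing cells "
      f"({share_of_missing:.1f}%) are in fully empty columns")
```

These are the same eight columns that the report flags as Unsupported, so dropping them before further analysis is a reasonable first step.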

Reproduction

Analysis started: 2024-04-10 13:01:27.032422
Analysis finished: 2024-04-10 13:01:30.384468
Duration: 3.35 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
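
The reported duration is the difference between the two timestamps. A short check with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-04-10 13:01:27.032422")
finished = datetime.fromisoformat("2024-04-10 13:01:30.384468")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```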

Variables

TRAV
Text

Distinct: 106
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 276.3 KiB

Length

Max length: 15
Median length: 13
Mean length: 10.14035584
Min length: 5

Characters and Unicode

Total characters: 358492
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: TRAV26-1*01
2nd row: TRAV20*01
3rd row: TRAV38-2/DV8*01
4th row: TRAV26-1*01
5th row: TRAV20*01

Value              Count  Frequency (%)
trav12-2*01        2641   7.5%
trav19*01          1815   5.1%
trav12-1*01        1751   5.0%
trav14/dv4*01      1570   4.4%
trav1-2*01         1539   4.4%
trav13-1*01        1509   4.3%
trav21*01          1492   4.2%
trav29/dv5*01      1490   4.2%
trav35*01          1457   4.1%
trav17*01          1427   4.0%
Other values (95)  18662  52.8%

Most occurring characters

Value             Count  Frequency (%)
1                 58051  16.2%
V                 39893  11.1%
0                 36212  10.1%
T                 35353  9.9%
A                 35353  9.9%
R                 35353  9.9%
*                 34804  9.7%
2                 22095  6.2%
-                 16539  4.6%
3                 9704   2.7%
Other values (9)  35135  9.8%
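
The mean length and the frequency column for TRAV can be recovered from the counts in this section. A minimal sketch, assuming the frequencies are taken relative to the 35353 observations (TRAV has no missing values, so observations and non-missing values coincide):

```python
# Figures copied from the TRAV section.
n_observations = 35353
total_characters = 358492
top_value_count = 2641  # trav12-2*01

# Mean string length is total characters divided by the number of values.
mean_length = total_characters / n_observations

# Frequency of the most common value as a percentage.
top_value_share = 100 * top_value_count / n_observations

print(f"mean length:     {mean_length:.8f}")
print(f"top value share: {top_value_share:.1f}%")
```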

TRAJ
Text

Distinct: 104
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 276.3 KiB

Length

Max length: 9
Median length: 9
Mean length: 8.844991938
Min length: 5

Characters and Unicode

Total characters: 312697
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: TRAJ43*01
2nd row: TRAJ28*01
3rd row: TRAJ40*01
4th row: TRAJ43*01
5th row: TRAJ28*01

Value              Count  Frequency (%)
traj42*01          2819   8.0%
traj43*01          1241   3.5%
traj45*01          1192   3.4%
traj20*01          1160   3.3%
traj49*01          1158   3.3%
traj52*01          1156   3.3%
traj33*01          1154   3.3%
traj40*01          1101   3.1%
traj30*01          1066   3.0%
traj34*01          1032   2.9%
Other values (94)  22274  63.0%

Most occurring characters

Value             Count  Frequency (%)
1                 40997  13.1%
0                 39334  12.6%
T                 35353  11.3%
R                 35353  11.3%
A                 35353  11.3%
J                 35353  11.3%
*                 34800  11.1%
4                 13687  4.4%
3                 12370  4.0%
2                 12165  3.9%
Other values (5)  17932  5.7%

Distinct: 21940
Distinct (%): 62.1%
Missing: 0
Missing (%): 0.0%
Memory size: 276.3 KiB

Length

Max length: 30
Median length: 25
Mean length: 13.58422199
Min length: 4

Characters and Unicode

Total characters: 480243
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 15593
Unique (%): 44.1%

Sample

1st row: CIVRAPGRADMRF
2nd row: CAVPSGAGSYQLTF
3rd row: CAYRPPGTYKYIF
4th row: CIVRAPGRADMRF
5th row: CAVPSGAGSYQLTF

Value                 Count  Frequency (%)
caglnyggsqgnlif       188    0.5%
cagqnyggsqgnlif       172    0.5%
caigpgnmltf           151    0.4%
caasetsydkvif         150    0.4%
cagmnyggsqgnlif       134    0.4%
cavdlmktsydkvif       124    0.4%
cagggsqgnlif          107    0.3%
cadsgggadgltf         103    0.3%
camrrpissgsarqltf     102    0.3%
cavrdsnyqliw          64     0.2%
Other values (21930)  34058  96.3%

Most occurring characters

Value              Count   Frequency (%)
G                  59175   12.3%
A                  51661   10.8%
F                  41942   8.7%
L                  36078   7.5%
C                  34974   7.3%
S                  33230   6.9%
N                  30943   6.4%
T                  28588   6.0%
V                  24667   5.1%
K                  21524   4.5%
Other values (14)  117461  24.5%

TRBV
Text

Distinct: 124
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 276.3 KiB

Length

Max length: 15
Median length: 12
Mean length: 9.701835771
Min length: 5

Characters and Unicode

Total characters: 342989
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 16
Unique (%): < 0.1%

Sample

1st row: TRBV13*01
2nd row: TRBV13*01
3rd row: TRBV14*01
4th row: TRBV13*01
5th row: TRBV13*01

Value               Count  Frequency (%)
trbv19*01           3545   10.0%
trbv20-1*01         2220   6.3%
trbv27*01           2174   6.1%
trbv7-9*01          2081   5.9%
trbv9*01            1660   4.7%
trbv4-1*01          1266   3.6%
trbv2*01            1198   3.4%
trbv6-5*01          1143   3.2%
trbv5-1*01          1022   2.9%
trbv28*01           1010   2.9%
Other values (112)  18034  51.0%

Most occurring characters

Value             Count  Frequency (%)
1                 53964  15.7%
0                 38053  11.1%
R                 35354  10.3%
T                 35353  10.3%
B                 35353  10.3%
V                 35353  10.3%
*                 34302  10.0%
-                 22764  6.6%
2                 13769  4.0%
9                 8736   2.5%
Other values (9)  29988  8.7%

TRBJ
Text

Distinct: 27
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 276.3 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.910898651
Min length: 7

Characters and Unicode

Total characters: 350380
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TRBJ1-5*01
2nd row: TRBJ1-5*01
3rd row: TRBJ2-1*01
4th row: TRBJ1-5*01
5th row: TRBJ1-5*01

Value              Count  Frequency (%)
trbj2-7*01         5812   16.4%
trbj2-1*01         5738   16.2%
trbj1-2*01         4144   11.7%
trbj1-1*01         4078   11.5%
trbj2-3*01         3611   10.2%
trbj2-2*01         3250   9.2%
trbj2-5*01         2313   6.5%
trbj1-5*01         1952   5.5%
trbj1-4*01         1048   3.0%
trbj1-6*01         824    2.3%
Other values (17)  2583   7.3%

Most occurring characters

Value             Count  Frequency (%)
1                 57240  16.3%
T                 35353  10.1%
R                 35353  10.1%
B                 35353  10.1%
J                 35353  10.1%
-                 35353  10.1%
*                 34303  9.8%
0                 34303  9.8%
2                 29831  8.5%
7                 6034   1.7%
Other values (4)  11904  3.4%

Distinct: 24154
Distinct (%): 68.3%
Missing: 0
Missing (%): 0.0%
Memory size: 276.3 KiB

Length

Max length: 26
Median length: 24
Mean length: 14.40797103
Min length: 6

Characters and Unicode

Total characters: 509365
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 17895
Unique (%): 50.6%

Sample

1st row: CASSYLPGQGDHYSNQPQHF
2nd row: CASSFEPGQGFYSNQPQHF
3rd row: CASSALASLNEQFF
4th row: CASSYLPGQGDHYSNQPQHF
5th row: CASSFEPGQGFYSNQPQHF

Value                 Count  Frequency (%)
cassirssyeqyf         272    0.8%
casswgggshygytf       145    0.4%
cassfsgntgelff        136    0.4%
casslrdgseaff         97     0.3%
cassirsayeqyf         78     0.2%
csvdleanygytf         74     0.2%
cassvrssyeqyf         46     0.1%
cassygaggyneqff       45     0.1%
casrtglastdtqyf       40     0.1%
cassqdhrmggheklff     39     0.1%
Other values (24144)  34381  97.3%

Most occurring characters

Value              Count  Frequency (%)
S                  81256  16.0%
A                  52620  10.3%
F                  52220  10.3%
G                  51974  10.2%
C                  34073  6.7%
T                  33563  6.6%
Q                  31804  6.2%
Y                  31737  6.2%
E                  29020  5.7%
L                  21464  4.2%
Other values (11)  89634  17.6%

TRA_leader
Text

MISSING 

Distinct: 51
Distinct (%): 0.1%
Missing: 638
Missing (%): 1.8%
Memory size: 276.3 KiB

Length

Max length: 18
Median length: 16
Mean length: 13.18479044
Min length: 11

Characters and Unicode

Total characters: 457710
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: TRAV26-1*01(L)
2nd row: TRAV20*01(L)
3rd row: TRAV38-2/DV8*01(L)
4th row: TRAV26-1*01(L)
5th row: TRAV20*01(L)

Value              Count  Frequency (%)
trav12-2*01(l      2643   7.6%
trav19*01(l        1815   5.2%
trav12-1*01(l      1748   5.0%
trav13-1*01(l      1590   4.6%
trav14/dv4*01(l    1572   4.5%
trav1-2*01(l       1537   4.4%
trav21*01(l        1492   4.3%
trav29/dv5*01(l    1484   4.3%
trav35*01(l        1456   4.2%
trav17*01(l        1407   4.1%
Other values (41)  17971  51.8%

Most occurring characters

Value              Count  Frequency (%)
1                  57658  12.6%
V                  39186  8.6%
0                  36107  7.9%
T                  34715  7.6%
)                  34715  7.6%
R                  34715  7.6%
(                  34715  7.6%
L                  34715  7.6%
*                  34715  7.6%
A                  34715  7.6%
Other values (11)  81754  17.9%

TRB_leader
Text

MISSING 

Distinct: 69
Distinct (%): 0.2%
Missing: 1063
Missing (%): 3.0%
Memory size: 276.3 KiB

Length

Max length: 18
Median length: 14
Mean length: 12.7951881
Min length: 11

Characters and Unicode

Total characters: 438747
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: TRBV13*01(L)
2nd row: TRBV13*01(L)
3rd row: TRBV14*01(L)
4th row: TRBV13*01(L)
5th row: TRBV13*01(L)

Value              Count  Frequency (%)
trbv19*01(l        3546   10.3%
trbv20-1*01(l      2218   6.5%
trbv27*01(l        2172   6.3%
trbv7-9*01(l       2081   6.1%
trbv9*01(l         1660   4.8%
trbv4-1*01(l       1266   3.7%
trbv2*01(l         1198   3.5%
trbv6-5*01(l       1143   3.3%
trbv5-1*01(l       1019   3.0%
trbv28*01(l        1009   2.9%
Other values (59)  16978  49.5%

Most occurring characters

Value              Count  Frequency (%)
1                  53494  12.2%
0                  37922  8.6%
R                  34291  7.8%
T                  34290  7.8%
L                  34290  7.8%
(                  34290  7.8%
)                  34290  7.8%
*                  34290  7.8%
V                  34290  7.8%
B                  34290  7.8%
Other values (11)  73010  16.6%

Linker
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 35353
Missing (%): 100.0%
Memory size: 276.3 KiB

Link_order
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 35353
Missing (%): 100.0%
Memory size: 276.3 KiB

TRA_5_prime_seq
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 35353
Missing (%): 100.0%
Memory size: 276.3 KiB

TRA_3_prime_seq
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 35353
Missing (%): 100.0%
Memory size: 276.3 KiB

TRB_5_prime_seq
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 35353
Missing (%): 100.0%
Memory size: 276.3 KiB

TRB_3_prime_seq
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 35353
Missing (%): 100.0%
Memory size: 276.3 KiB

Linked_nt
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 35353
Missing (%): 100.0%
Memory size: 276.3 KiB

Linked_aa
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 35353
Missing (%): 100.0%
Memory size: 276.3 KiB
Distinct: 6219
Distinct (%): 17.6%
Missing: 0
Missing (%): 0.0%
Memory size: 276.3 KiB

Length

Max length: 3568
Median length: 6
Mean length: 404.7933132
Min length: 6

Characters and Unicode

Total characters: 14310658
Distinct characters: 73
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5429
Unique (%): 15.4%

Sample

1st row: [None]
2nd row: [None]
3rd row: [None]
4th row: [None]
5th row: [None]

Value               Count    Frequency (%)
the                 187578   8.0%
for                 154645   6.6%
allele              151076   6.4%
region              121735   5.2%
01                  75403    3.2%
being               75401    3.2%
trb                 68601    2.9%
tra                 62114    2.6%
is                  60004    2.6%
a                   51473    2.2%
Other values (411)  1338065  57.0%

Most occurring characters

Value              Count    Frequency (%)
(space)            2430202  17.0%
e                  1843482  12.9%
l                  942992   6.6%
i                  889883   6.2%
a                  739130   5.2%
r                  722217   5.0%
o                  653005   4.6%
t                  596039   4.2%
n                  585767   4.1%
d                  431554   3.0%
Other values (63)  4476387  31.3%

Distinct: 1289
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 276.3 KiB

Length

Max length: 33
Median length: 9
Mean length: 9.49497921
Min length: 7

Characters and Unicode

Total characters: 335676
Distinct characters: 20
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 367
Unique (%): 1.0%

Sample

1st row: FLKEKGGL
2nd row: FLKEKGGL
3rd row: FLKEKGGL
4th row: FLKEQGGL
5th row: FLKEQGGL

Value                Count  Frequency (%)
klggalqak            13619  38.5%
gilgfvftl            2489   7.0%
avfdrksdak           1719   4.9%
rakfkqll             1212   3.4%
tfeyvsqpflmdle       976    2.8%
llwngpmav            840    2.4%
ylqprtfll            838    2.4%
sprwyfyyl            793    2.2%
ttdpsflgry           767    2.2%
ivtdfsvik            718    2.0%
Other values (1279)  11382  32.2%

Most occurring characters

Value              Count  Frequency (%)
L                  57752  17.2%
A                  42150  12.6%
G                  40074  11.9%
K                  37975  11.3%
Q                  22033  6.6%
F                  18420  5.5%
V                  16621  5.0%
T                  14648  4.4%
P                  11504  3.4%
I                  10427  3.1%
Other values (10)  64072  19.1%

Score
Real number (ℝ)

MISSING  ZEROS 

Distinct: 4
Distinct (%): < 0.1%
Missing: 8583
Missing (%): 24.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1078446022
Minimum: 0
Maximum: 3
Zeros: 25266
Zeros (%): 71.5%
Negative: 0
Negative (%): 0.0%
Memory size: 276.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 3
Range: 3
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4827191399
Coefficient of variation (CV): 4.476062132
Kurtosis: 23.11594445
Mean: 0.1078446022
Median Absolute Deviation (MAD): 0
Skewness: 4.819080067
Sum: 2887
Variance: 0.2330177681
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=4)

Value      Count  Frequency (%)
0          25266  71.5%
1          569    1.6%
2          487    1.4%
3          448    1.3%
(Missing)  8583   24.3%
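
The descriptive statistics for Score can be reproduced from the four value counts alone. The sketch below assumes the variance uses the sample (n - 1) denominator and that missing rows are excluded; both assumptions are inferred from the numbers matching, not stated in the report.

```python
import math

# Value counts for Score, excluding the 8583 missing rows.
value_counts = {0: 25266, 1: 569, 2: 487, 3: 448}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Assumption: sample variance with an n - 1 denominator.
squared_dev = sum(count * (value - mean) ** 2
                  for value, count in value_counts.items())
variance = squared_dev / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(f"n={n} sum={total} mean={mean:.10f}")
print(f"variance={variance:.10f} std={std:.10f} cv={cv:.9f}")
```

With only 5.6% of the non-missing scores above zero, the mean of roughly 0.11 mostly reflects the share of zeros rather than a typical score.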

MHC A
Text

MISSING 

Distinct: 98
Distinct (%): 0.3%
Missing: 1984
Missing (%): 5.6%
Memory size: 276.3 KiB

Length

Max length: 20
Median length: 11
Mean length: 11.01417483
Min length: 6

Characters and Unicode

Total characters: 367532
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 20
Unique (%): 0.1%

Sample

1st row: HLA-B*08
2nd row: HLA-B*08
3rd row: HLA-B*08
4th row: HLA-B*08
5th row: HLA-B*08

Value              Count  Frequency (%)
hla-a*03:01        14287  42.8%
hla-a*02:01        8844   26.5%
hla-a*11:01        2489   7.5%
hla-a*01:01        1752   5.3%
hla-b*07:02        1584   4.7%
hla-b*08:01        1242   3.7%
hla-a*24:02        751    2.3%
hla-dqa1*05:01     345    1.0%
hla-a*02           247    0.7%
hla-b*15:01        234    0.7%
Other values (88)  1594   4.8%

Most occurring characters

Value              Count  Frequency (%)
A                  62771  17.1%
0                  61827  16.8%
1                  38118  10.4%
H                  33369  9.1%
L                  33369  9.1%
-                  33369  9.1%
*                  33159  9.0%
:                  32982  9.0%
3                  14592  4.0%
2                  12609  3.4%
Other values (13)  11367  3.1%

MHC B
Text

MISSING 

Distinct: 51
Distinct (%): 1.9%
Missing: 32713
Missing (%): 92.5%
Memory size: 276.3 KiB

Length

Max length: 20
Median length: 14
Mean length: 12.78522727
Min length: 8

Characters and Unicode

Total characters: 33753
Distinct characters: 22
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 13
Unique (%): 0.5%

Sample

1st row: HLA-DQB1*06:02
2nd row: HLA-DQB1*06:02
3rd row: HLA-DQB1*06:02
4th row: HLA-DRB1*15:03
5th row: HLA-DRB1*15:03

Value              Count  Frequency (%)
hla-a*02:01        850    32.2%
hla-dpb1*04:01     594    22.5%
hla-dqb1*06:02     491    18.6%
hla-drb1*07:01     131    5.0%
hla-a*02           97     3.7%
hla-dqb1*02:01     86     3.3%
hla-drb1*04:01     81     3.1%
hla-drb1*15:01     57     2.2%
hla-drb3*03:01     38     1.4%
hla-a*24:02        32     1.2%
Other values (41)  183    6.9%

Most occurring characters

Value              Count  Frequency (%)
0                  5048   15.0%
1                  3711   11.0%
A                  3628   10.7%
H                  2640   7.8%
L                  2640   7.8%
-                  2640   7.8%
*                  2640   7.8%
:                  2577   7.6%
2                  1662   4.9%
B                  1648   4.9%
Other values (12)  4919   14.6%
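
Because MHC B is missing in 92.5% of rows, its percentages only make sense relative to the non-missing values. The sketch below checks that reading against the figures in this section; the choice of denominator is an inference from the numbers, not something the report states.

```python
# Figures copied from the MHC B section and the dataset statistics.
n_observations = 35353
n_missing = 32713
n_distinct = 51
total_characters = 33753
top_value_count = 850  # hla-a*02:01

# Values actually present in the column.
n_present = n_observations - n_missing

# Assumption: Distinct (%), value frequencies and mean length all use
# the non-missing count as the denominator.
distinct_pct = 100 * n_distinct / n_present
top_value_share = 100 * top_value_count / n_present
mean_length = total_characters / n_present

print(f"present values:  {n_present}")
print(f"distinct:        {distinct_pct:.1f}%")
print(f"top value share: {top_value_share:.1f}%")
print(f"mean length:     {mean_length:.8f}")
```

The 2640 present values also match the count of the characters H, L, - and * in the table above, each of which appears once per HLA name.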

Distinct: 2
Distinct (%): < 0.1%
Missing: 8
Missing (%): < 0.1%
Memory size: 276.3 KiB

Length

Max length: 5
Median length: 4
Mean length: 4.077945961
Min length: 4

Characters and Unicode

Total characters: 144135
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MHCI
2nd row: MHCI
3rd row: MHCI
4th row: MHCI
5th row: MHCI

Value  Count  Frequency (%)
mhci   32590  92.2%
mhcii  2755   7.8%

Most occurring characters

Value  Count  Frequency (%)
I      38100  26.4%
M      35345  24.5%
H      35345  24.5%
C      35345  24.5%

Binding
Real number (ℝ)

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1
Minimum: 1
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 276.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 0
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0
Coefficient of variation (CV): 0
Kurtosis: 0
Mean: 1
Median Absolute Deviation (MAD): 0
Skewness: 0
Sum: 35353
Variance: 0
Monotonicity: Increasing

Histogram with fixed size bins (bins=1)

Value  Count  Frequency (%)
1      35353  100.0%